Statistical Physics of Deep Neural Networks: Generalization Capability, Beyond the Infinite Width, and Feature Learning
Deep Neural Networks (DNNs) excel at many tasks, often rivaling or surpassing human performance. Yet their internal processes remain elusive, frequently described as "black boxes." While performance can be refined experimentally, achieving a fundamental grasp of their inner workings is still a challenge. Statistical Mechanics has long tackled computational problems, and this thesis applies physics-based insights to understand DNNs via three complementary approaches. First, by averaging over data, we derive an asymptotic bound on generalization that depends solely on the size of the last layer, rather than on the total number of parameters -- revealing how deep architectures process information differently across layers. Second, adopting a data-dependent viewpoint, we explore a finite-width thermodynamic limit beyond the infinite-width regime. This leads to: (i) a closed-form expression for the generalization error in a finite-width one-hidden-layer network (regression task); (ii) an approximate partition function for deeper architectures; and (iii) a link between deep networks in this thermodynamic limit and Student's t-processes. Finally, from a task-explicit perspective, we present a preliminary analysis of how DNNs interact with a controlled dataset, investigating whether they truly internalize its structure -- collapsing to the teacher -- or merely memorize it. Understanding when a network must learn the structure of the data, rather than merely memorize it, sheds light on how to foster meaningful internal representations. In essence, this thesis leverages the synergy between Statistical Physics and Machine Learning to illuminate the inner behavior of DNNs.
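The infinite-width regime the thesis moves beyond can be made concrete. In that limit, a one-hidden-layer ReLU network with Gaussian weights behaves as a Gaussian process whose covariance has a known closed form (Cho and Saul's degree-1 arc-cosine kernel). The sketch below is independent of the thesis's own derivations: it compares a Monte Carlo estimate of the hidden-layer covariance against that closed form, with a 1/d weight variance chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def empirical_nngp(x, y, n_samples=200_000):
    # Monte Carlo estimate of E_w[relu(w.x) relu(w.y)] with w ~ N(0, I/d),
    # i.e. the hidden-unit covariance of a one-hidden-layer ReLU network
    d = x.shape[0]
    w = rng.normal(0.0, 1.0 / np.sqrt(d), size=(n_samples, d))
    return float(np.mean(relu(w @ x) * relu(w @ y)))

def arccos_kernel(x, y):
    # Closed form of the same expectation: Cho & Saul's degree-1
    # arc-cosine kernel, scaled by the 1/d weight variance.
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    theta = np.arccos(np.clip(x @ y / (nx * ny), -1.0, 1.0))
    return nx * ny * (np.sin(theta)
                      + (np.pi - theta) * np.cos(theta)) / (2 * np.pi * x.shape[0])
```

The finite-width corrections studied in the thesis are precisely what this Gaussian-process description misses.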
International AI Safety Report
Bengio, Yoshua, Mindermann, Sören, Privitera, Daniel, Besiroglu, Tamay, Bommasani, Rishi, Casper, Stephen, Choi, Yejin, Fox, Philip, Garfinkel, Ben, Goldfarb, Danielle, Heidari, Hoda, Ho, Anson, Kapoor, Sayash, Khalatbari, Leila, Longpre, Shayne, Manning, Sam, Mavroudis, Vasilios, Mazeika, Mantas, Michael, Julian, Newman, Jessica, Ng, Kwan Yee, Okolo, Chinasa T., Raji, Deborah, Sastry, Girish, Seger, Elizabeth, Skeadas, Theodora, South, Tobin, Strubell, Emma, Tramèr, Florian, Velasco, Lucia, Wheeler, Nicole, Acemoglu, Daron, Adekanmbi, Olubayo, Dalrymple, David, Dietterich, Thomas G., Felten, Edward W., Fung, Pascale, Gourinchas, Pierre-Olivier, Heintz, Fredrik, Hinton, Geoffrey, Jennings, Nick, Krause, Andreas, Leavy, Susan, Liang, Percy, Ludermir, Teresa, Marda, Vidushi, Margetts, Helen, McDermid, John, Munga, Jane, Narayanan, Arvind, Nelson, Alondra, Neppel, Clara, Oh, Alice, Ramchurn, Gopal, Russell, Stuart, Schaake, Marietje, Schölkopf, Bernhard, Song, Dawn, Soto, Alvaro, Tiedrich, Lee, Varoquaux, Gaël, Yao, Andrew, Zhang, Ya-Qin, Albalawi, Fahad, Alserkal, Marwan, Ajala, Olubunmi, Avrin, Guillaume, Busch, Christian, de Carvalho, André Carlos Ponce de Leon Ferreira, Fox, Bronwyn, Gill, Amandeep Singh, Hatip, Ahmet Halit, Heikkilä, Juha, Jolly, Gill, Katzir, Ziv, Kitano, Hiroaki, Krüger, Antonio, Johnson, Chris, Khan, Saif M., Lee, Kyoung Mu, Ligot, Dominic Vincent, Molchanovskyi, Oleksii, Monti, Andrea, Mwamanzi, Nusu, Nemer, Mona, Oliver, Nuria, Portillo, José Ramón López, Ravindran, Balaraman, Rivera, Raquel Pezoa, Riza, Hammam, Rugege, Crystal, Seoighe, Ciarán, Sheehan, Jerry, Sheikh, Haroon, Wong, Denise, Zeng, Yi
I am honoured to present the International AI Safety Report. It is the work of 96 international AI experts who collaborated in an unprecedented effort to establish an internationally shared scientific understanding of risks from advanced AI and methods for managing them. We embarked on this journey just over a year ago, shortly after the countries present at the Bletchley Park AI Safety Summit agreed to support the creation of this report. Since then, we published an Interim Report in May 2024, which was presented at the AI Seoul Summit. We are now pleased to publish the present, full report ahead of the AI Action Summit in Paris in February 2025. Since the Bletchley Summit, the capabilities of general-purpose AI, the type of AI this report focuses on, have increased further. For example, new models have shown markedly better performance at tests of programming and scientific reasoning.
Professor Yoshua Bengio
Reviews: Characterizing Bias in Classifiers using Generative Models
Originality: The task of characterizing biases in face classification systems has recently received increasing attention from researchers. While related work uses either computer graphics or real-world data, here the authors propose to use conditional GANs. Related work is mostly adequately cited, although I would recommend also taking the following related computer graphics approach into account: Qiu, Weichao, and Alan Yuille. Quality: The claims made are supported by empirical analyses, although the experimental setting is rather limited, because only two classifiers have been tested. The limitations of GANs in terms of generating realistic images have been pointed out adequately.
On Uncertainty In Natural Language Processing
The last decade in deep learning has brought on increasingly capable systems that are deployed on a wide variety of applications. In natural language processing, the field has been transformed by a number of breakthroughs including large language models, which are used in increasingly many user-facing applications. In order to reap the benefits of this technology and reduce potential harms, it is important to quantify the reliability of model predictions and the uncertainties that shroud their development. This thesis studies how uncertainty in natural language processing can be characterized from a linguistic, statistical and neural perspective, and how it can be reduced and quantified through the design of the experimental pipeline. We further explore uncertainty quantification in modeling by theoretically and empirically investigating the effect of inductive model biases in text classification tasks. The corresponding experiments include data for three different languages (Danish, English and Finnish) and tasks as well as a large set of different uncertainty quantification approaches. Additionally, we propose a method for calibrated sampling in natural language generation based on non-exchangeable conformal prediction, which provides tighter token sets with better coverage of the actual continuation. Lastly, we develop an approach to quantify confidence in large black-box language models using auxiliary predictors, where the confidence is predicted from the input to and generated output text of the target model alone.
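The conformal-prediction idea behind the calibrated sampling method can be illustrated in its simplest, exchangeable form (the thesis's non-exchangeable variant additionally weights calibration points, which this sketch omits): score each calibration example by the nonconformity 1 - p(true token), take the adjusted empirical quantile, and keep every candidate token whose score stays within the threshold.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha):
    # Split conformal prediction: the ceil((n+1)(1-alpha))-th smallest
    # calibration score guarantees marginal coverage >= 1 - alpha
    # under exchangeability.
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.sort(np.asarray(cal_scores))[min(k, n) - 1]

def token_set(probs, qhat):
    # Prediction set: every token whose nonconformity 1 - p
    # falls within the calibrated threshold.
    return np.flatnonzero(1.0 - np.asarray(probs) <= qhat)
```

With alpha = 0.1 and well-calibrated scores, roughly the top tokens covering 90% of the probability mass are retained; the non-exchangeable variant tightens these sets when nearby calibration points are more relevant than distant ones.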
Large-scale Weakly Supervised Learning for Road Extraction from Satellite Imagery
Meng, Shiqiao, Di, Zonglin, Yang, Siwei, Wang, Yin
Automatic road extraction from satellite imagery using deep learning is a viable alternative to traditional manual mapping. Therefore, it has received considerable attention recently. However, most of the existing methods are supervised and require pixel-level labeling, which is tedious and error-prone. To make matters worse, the earth has a diverse range of terrain, vegetation, and man-made objects. It is well known that models trained in one area generalize poorly to other areas. Various shooting conditions such as light and angle, as well as different image processing techniques, further complicate the issue. It is impractical to develop training data to cover all image styles. This paper proposes to leverage OpenStreetMap road data as weak labels and large-scale satellite imagery to pre-train semantic segmentation models. Our extensive experimental results show that the prediction accuracy increases with the amount of weakly labeled data, as well as with the road density in the areas chosen for training. Using as much as 100 times more data than the widely used DeepGlobe road dataset, our model with the D-LinkNet architecture and the ResNet-50 backbone exceeds the top performer on the current DeepGlobe leaderboard. Furthermore, due to large-scale pre-training, our model generalizes much better than those trained with only the curated datasets, implying great application potential.
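One way to turn OpenStreetMap roads into weak labels of the kind described above is to rasterize each road polyline into a binary mask. The sketch below assumes the OSM ways have already been projected into pixel coordinates of the image tile (that projection step is not shown) and simply burns each segment, widened by a fixed margin, into the mask.

```python
import numpy as np

def rasterize_roads(segments, shape, width=2):
    """Burn road polyline segments into a binary mask usable as a weak label.

    segments: list of ((r0, c0), (r1, c1)) endpoints in pixel coordinates
    (a hypothetical, pre-projected form of OSM way geometry).
    width: half-width of the painted road, in pixels.
    """
    mask = np.zeros(shape, dtype=np.uint8)
    for (r0, c0), (r1, c1) in segments:
        # Sample the segment densely enough to hit every pixel it crosses.
        n = int(max(abs(r1 - r0), abs(c1 - c0))) + 1
        rs = np.linspace(r0, r1, n).round().astype(int)
        cs = np.linspace(c0, c1, n).round().astype(int)
        # Dilate by the road half-width, clipping at the image border.
        for dr in range(-width, width + 1):
            for dc in range(-width, width + 1):
                rr = np.clip(rs + dr, 0, shape[0] - 1)
                cc = np.clip(cs + dc, 0, shape[1] - 1)
                mask[rr, cc] = 1
    return mask
```

Labels produced this way are "weak" because OSM centerlines carry no true road width and may be misaligned with the imagery, which is exactly the noise the large-scale pre-training is meant to absorb.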
Breaking Through the Haze: An Advanced Non-Homogeneous Dehazing Method based on Fast Fourier Convolution and ConvNeXt
Zhou, Han, Dong, Wei, Liu, Yangyi, Chen, Jun
Haze usually leads to deteriorated images with low contrast, color shift and structural distortion. We observe that many deep learning based models exhibit exceptional performance on removing homogeneous haze, but they usually fail to address the challenge of non-homogeneous dehazing. Two main factors account for this situation. Firstly, due to the intricate and non-uniform distribution of dense haze, the recovery of structural and chromatic features with high fidelity is challenging, particularly in regions with heavy haze. Secondly, the existing small-scale datasets for non-homogeneous dehazing are inadequate to support reliable learning of feature mappings between hazy images and their corresponding haze-free counterparts by convolutional neural network (CNN)-based models. To tackle these two challenges, we propose a novel two-branch network that leverages the 2D discrete wavelet transform (DWT), fast Fourier convolution (FFC) residual blocks and a pretrained ConvNeXt model. Specifically, in the DWT-FFC frequency branch, our model exploits DWT to capture more high-frequency features. Moreover, by taking advantage of the large receptive field provided by FFC residual blocks, our model is able to effectively explore global contextual information and produce images with better perceptual quality. In the prior knowledge branch, an ImageNet pretrained ConvNeXt, as opposed to Res2Net, is adopted. This enables our model to learn more supplementary information and acquire a stronger generalization ability. The feasibility and effectiveness of the proposed method are demonstrated via extensive experiments and ablation studies. The code is available at https://github.com/zhouh115/DWT-FFC.
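The large receptive field of FFC residual blocks comes from operating in the frequency domain, where a pointwise product touches every spatial location at once. A toy single-channel sketch of that spectral path (learned weights replaced by explicit arrays, and the real block's parallel local-convolution path omitted):

```python
import numpy as np

def spectral_mix(x, w_real, w_imag):
    # Core of an FFC-style spectral transform: FFT the feature map,
    # modulate every frequency bin with a (here, hand-supplied) complex
    # weight, and transform back. One multiply = global receptive field.
    X = np.fft.rfft2(x)
    X = X * (w_real + 1j * w_imag)
    return np.fft.irfft2(X, s=x.shape)
```

Identity weights (real part 1, imaginary part 0) reproduce the input exactly, which is a convenient sanity check; learned weights instead reshape the image's global frequency content in one step.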
A Comprehensive Survey and Taxonomy on Single Image Dehazing Based on Deep Learning
Gui, Jie, Cong, Xiaofeng, Cao, Yuan, Ren, Wenqi, Zhang, Jun, Zhang, Jing, Cao, Jiuxin, Tao, Dacheng
The phenomenon of image quality degradation in hazy weather has a negative impact on photography work. The contrast of the image will decrease and the color will shift. Meanwhile, the texture and edges of objects in the scene will become blurred. As shown in Figure 1, there is an obvious difference between the pixel histograms of hazy and haze-free images. For computer vision tasks such as object detection and image segmentation, low-quality inputs can degrade the performance of models trained on haze-free images. Therefore, many researchers try to recover high-quality clear scenes from hazy images. Before deep learning was widely used in computer vision tasks, image dehazing algorithms had mainly relied on various prior assumptions [51].
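One widely used example of such a prior assumption is He et al.'s dark channel prior, which observes that haze-free outdoor patches almost always contain some pixel that is dark in at least one color channel; haze lifts this minimum, so the dark channel indicates haze density. A minimal sketch of computing it (patch size chosen for illustration):

```python
import numpy as np

def dark_channel(img, patch=15):
    # Dark channel prior: per-pixel minimum over the RGB channels,
    # then a minimum filter over a local square patch.
    dc = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(dc, pad, mode="edge")
    out = np.empty_like(dc)
    h, w = dc.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out
```

In prior-based dehazing pipelines this map is then used to estimate the transmission in the atmospheric scattering model and invert the haze; deep learning methods instead learn the mapping end to end.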
PyNet-V2 Mobile: Efficient On-Device Photo Processing With Neural Networks
Ignatov, Andrey, Malivenko, Grigory, Timofte, Radu, Tseng, Yu, Xu, Yu-Syuan, Yu, Po-Hsiang, Chiang, Cheng-Ming, Kuo, Hsien-Kai, Chen, Min-Hung, Cheng, Chia-Ming, Van Gool, Luc
The increased importance of mobile photography created a need for fast and performant RAW image processing pipelines capable of producing good visual results in spite of the mobile camera sensor limitations. While deep learning-based approaches can efficiently solve this problem, their computational requirements usually remain too large for high-resolution on-device image processing. To address this limitation, we propose a novel PyNET-V2 Mobile CNN architecture designed specifically for edge devices, capable of processing RAW 12MP photos directly on mobile phones in under 1.5 seconds while producing high perceptual photo quality. To train and evaluate the proposed solution, we use the real-world Fujifilm UltraISP dataset consisting of thousands of RAW-RGB image pairs captured with a professional medium-format 102MP Fujifilm camera and a popular Sony mobile camera sensor. The results demonstrate that the PyNET-V2 Mobile model can substantially surpass the quality of traditional ISP pipelines, while outperforming the previously introduced neural network-based solutions designed for fast image processing. Furthermore, we show that the proposed architecture is also compatible with the latest mobile AI accelerators such as NPUs and APUs, which can be used to further reduce the latency of the model to as little as 0.5 seconds. The dataset, code and pre-trained models used in this paper are available on the project website: https://github.com/gmalivenko/PyNET-v2
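A classical ISP pipeline of the kind the learned model replaces starts by demosaicing the sensor's Bayer pattern. The toy bilinear demosaic below, for an RGGB mosaic, is a deliberately simplified stand-in for such a stage, not the paper's method: each channel keeps its sampled pixels and fills the rest by averaging available neighbours.

```python
import numpy as np

def demosaic_bilinear(raw):
    """Toy bilinear demosaic of an RGGB Bayer mosaic. raw: (H, W) floats."""
    h, w = raw.shape
    rgb = np.zeros((h, w, 3))
    masks = np.zeros((h, w, 3), dtype=bool)
    masks[0::2, 0::2, 0] = True   # R at even rows, even cols
    masks[0::2, 1::2, 1] = True   # G on red rows
    masks[1::2, 0::2, 1] = True   # G on blue rows
    masks[1::2, 1::2, 2] = True   # B at odd rows, odd cols
    for c in range(3):
        rgb[..., c][masks[..., c]] = raw[masks[..., c]]
        # Average the sampled neighbours within a 3x3 window.
        known = masks[..., c].astype(float)
        pv = np.pad(rgb[..., c], 1)
        pk = np.pad(known, 1)
        acc = sum(pv[1 + di:h + 1 + di, 1 + dj:w + 1 + dj]
                  for di in (-1, 0, 1) for dj in (-1, 0, 1))
        cnt = sum(pk[1 + di:h + 1 + di, 1 + dj:w + 1 + dj]
                  for di in (-1, 0, 1) for dj in (-1, 0, 1))
        fill = acc / np.maximum(cnt, 1)
        rgb[..., c] = np.where(masks[..., c], rgb[..., c], fill)
    return rgb
```

Hand-engineered stages like this (plus denoising, white balance, tone mapping) are what an end-to-end network such as PyNET-V2 Mobile learns jointly from RAW-RGB pairs.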
DF-Captcha: A Deepfake Captcha for Preventing Fake Calls
Social engineering (SE) is a form of deception that aims to trick people into giving access to data, information, networks and even money. For decades SE has been a key method for attackers to gain access to an organization, virtually skipping all lines of defense. Attackers also regularly use SE to scam innocent people by making threatening phone calls which impersonate an authority or by sending infected emails which look like they have been sent from a loved one. SE attacks will likely remain a top attack vector for criminals because humans are the weakest link in cyber security. Unfortunately, the threat will only get worse now that a new technology called deepfakes has arrived. A deepfake is believable media (e.g., videos) created by an AI. Although the technology has mostly been used to swap the faces of celebrities, it can also be used to `puppet' different personas. Recently, researchers have shown how this technology can be deployed in real-time to clone someone's voice in a phone call or reenact a face in a video call. Given that any novice user can download and use this technology, it is no surprise that criminals have already begun to monetize it to perpetrate their SE attacks. In this paper, we propose a lightweight application which can protect organizations and individuals from deepfake SE attacks. Through a challenge and response approach, we leverage the technical and theoretical limitations of deepfake technologies to expose the attacker. Existing defence solutions are too heavyweight as end-point solutions and can be evaded by a dynamic attacker. In contrast, our approach is lightweight and breaks the reactive arms race, putting the attacker at a disadvantage.
The First Principles of Deep Learning and Compression
The deep learning revolution incited by the 2012 Alexnet paper has been transformative for the field of computer vision. Many problems which were severely limited using classical solutions are now seeing unprecedented success. The rapid proliferation of deep learning methods has led to a sharp increase in their use in consumer and embedded applications. One consequence of consumer and embedded deployment is the need for lossy multimedia compression, which is required to engineer the efficient storage and transmission of data in these real-world scenarios. As such, there has been increased interest in a deep learning solution for multimedia compression which would allow for higher compression ratios and increased visual quality. The deep learning approach to multimedia compression, so-called Learned Multimedia Compression, involves computing a compressed representation of an image or video using a deep network for the encoder and the decoder. While these techniques have enjoyed impressive academic success, their industry adoption has been essentially non-existent. Classical compression techniques like JPEG and MPEG are too entrenched in modern computing to be easily replaced. This dissertation takes an orthogonal approach and leverages deep learning to improve the compression fidelity of these classical algorithms. This allows the incredible advances in deep learning to be used for multimedia compression without threatening the ubiquity of the classical methods. The key insight of this work is that methods which are motivated by first principles, i.e., the underlying engineering decisions that were made when the compression algorithms were developed, are more effective than general methods. By encoding prior knowledge into the design of the algorithm, the flexibility, performance, and/or accuracy are improved at the cost of generality...
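The "first principles" framing can be grounded in JPEG's central engineering decision: an 8x8 block DCT followed by quantization, the step where information is actually discarded. A minimal sketch of that pair of operations (the quantization step size is left as a free parameter rather than a standard table):

```python
import numpy as np

def dct2(block):
    # Orthonormal 8x8 type-II DCT, the transform at the heart of JPEG.
    n = 8
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)  # DC row has a different normalization
    return C @ block @ C.T

def quantize(coeffs, q):
    # The lossy step: dividing by q and rounding discards small
    # (mostly high-frequency) coefficients.
    return np.round(coeffs / q)
```

Methods built on such first principles can, for example, target the quantized DCT coefficients directly instead of treating a JPEG as an opaque degraded image.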